Chapter 3 Higher order derivatives


You certainly realize from single-variable calculus how very important it is to use derivatives 
of orders greater than one. The same is of course true for multivariable calculus. In particular, 
we shall definitely want a “second derivative test” for critical points. 


A. Partial derivatives 


First we need to clarify just what sort of domains we wish to consider for our functions. 
So we give a couple of definitions. 





DEFINITION. Suppose $x_0 \in A \subset \mathbb{R}^n$. We say that $x_0$ is an interior point of $A$ if there exists $r > 0$ such that $B(x_0, r) \subset A$.




DEFINITION. A set $A \subset \mathbb{R}^n$ is said to be open if every point of $A$ is an interior point of $A$.











EXAMPLE. An open ball is an open set. (So we are fortunate in our terminology!) To see this suppose $A = B(x, r)$. Then if $y \in A$, $\|x - y\| < r$ and we can let

$$\epsilon = r - \|x - y\|.$$

Then $B(y, \epsilon) \subset A$. For if $z \in B(y, \epsilon)$, then the triangle inequality implies

$$\|z - x\| \le \|z - y\| + \|y - x\| < \epsilon + \|y - x\| = r,$$

so that $z \in A$. Thus $B(y, \epsilon) \subset B(x, r)$ (cf. Problem 1-34). This proves that $y$ is an interior point of $A$. As $y$ is an arbitrary point of $A$, we conclude that $A$ is open.


PROBLEM 3-1. Explain why the empty set is an open set and why this is actually a 
problem about the logic of language rather than a problem about mathematics. 











PROBLEM 3-2. Prove that if $A$ and $B$ are open subsets of $\mathbb{R}^n$, then their union $A \cup B$ and their intersection $A \cap B$ are also open.

PROBLEM 3-3. Prove that the closed ball $\overline{B}(x, r)$ is not open.











PROBLEM 3-4. Prove that an interval in $\mathbb{R}$ is an open subset $\iff$ it is of the form $(a, b)$, where $-\infty \le a < b \le \infty$. Of course, $(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$.





























PROBLEM 3-5. Prove that an interval in $\mathbb{R}$ is never an open subset of $\mathbb{R}^2$. Specifically, this means that the $x$-axis is not an open subset of the $x$-$y$ plane. More generally, prove that if we regard $\mathbb{R}^k$ as a subset of $\mathbb{R}^n$, for $1 \le k \le n - 1$, by identifying $\mathbb{R}^k$ with the Cartesian product $\mathbb{R}^k \times \{0\}$, where $0$ is the origin in $\mathbb{R}^{n-k}$, then no point of $\mathbb{R}^k$ is an interior point relative to the "universe" $\mathbb{R}^n$.































































































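NOTATION. Suppose $A \subset \mathbb{R}^n$ is an open set and $A \xrightarrow{f} \mathbb{R}$ is differentiable at every point $x \in A$. Then we say that $f$ is differentiable on $A$. Of course, we know then that in particular the partial derivatives $\partial f/\partial x_j$ all exist. It may happen that $\partial f/\partial x_j$ itself is a differentiable function on $A$. Then we know that its partial derivatives also exist. The notation we shall use for the latter partial derivatives is

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right).$$

Notice the care we take to denote the order in which these differentiations are performed. In case $i = j$ we also write

$$\frac{\partial^2 f}{\partial x_i^2} = \frac{\partial^2 f}{\partial x_i \partial x_i} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_i}\right).$$

By the way, if we employ the subscript notation (p. 2-20) for derivatives, then we have logically

$$f_{x_j x_i} = (f_{x_j})_{x_i} = \frac{\partial^2 f}{\partial x_i \partial x_j}.$$

Partial derivatives of higher orders are given a similar notation. For instance, we have

$$\frac{\partial^7 f}{\partial x^3\, \partial y\, \partial z^2\, \partial x} = f_{xzzyxxx}.$$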
PROBLEM 3-6. Let $(0, \infty) \times \mathbb{R} \xrightarrow{f} \mathbb{R}$ be defined as

$$f(x, y) = x^y.$$

Compute $f_{xx}$, $f_{xy}$, $f_{yx}$, $f_{yy}$.


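PROBLEM 3-7. Let $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ be defined for $x \ne 0$ as $f(x) = \log \|x\|$. Show that

$$\frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} = 0.$$

This is called Laplace's equation on $\mathbb{R}^2$.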


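PROBLEM 3-8. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be defined for $x \ne 0$ as $f(x) = \|x\|^{2-n}$. Show that

$$\sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2} = 0.$$

(Laplace's equation on $\mathbb{R}^n$.)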


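PROBLEM 3-9. Let $\mathbb{R} \xrightarrow{\varphi} \mathbb{R}$ be a differentiable function whose derivative $\varphi'$ is also differentiable. Define $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ as

$$f(x, y) = \varphi(x + y).$$

Show that

$$\frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial y^2} = 0.$$

Do the same for $g(x, y) = \varphi(x - y)$.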





Now suppose $A \subset \mathbb{R}^n$ is open and $A \xrightarrow{f} \mathbb{R}$ is differentiable at every point $x \in A$. Then we say $f$ is differentiable on $A$. The most important instance of this is the case in which all the partial derivatives $\partial f/\partial x_j$ exist and are continuous functions on $A$ (recall the important corollary on p. 2-32). We say in this case that $f$ is of class $C^1$ and write

$$f \in C^1(A).$$

(In case $f$ is a continuous function on $A$, we write $f \in C^0(A)$.)

Likewise, suppose $f \in C^1(A)$ and each $\partial f/\partial x_j \in C^1(A)$; then we say

$$f \in C^2(A).$$

In the same way we define recursively $C^k(A)$, and we note of course the string of inclusions

$$C^0(A) \supset C^1(A) \supset C^2(A) \supset \cdots.$$

The fact is that most functions encountered in calculus have the property that they have continuous partial derivatives of all orders. That is, they belong to the class $C^k(A)$ for all $k$. We say that these functions are infinitely (often) differentiable, and we write the collection of such functions as $C^\infty(A)$. Thus we have

$$C^\infty(A) = \bigcap_{k=0}^{\infty} C^k(A)$$

and

$$C^0(A) \supset C^1(A) \supset C^2(A) \supset \cdots \supset C^\infty(A).$$


PROBLEM 3-10. Show that even in single-variable calculus all the above inclusions are strict. That is, for every $k$ show that there exists $f \in C^k(\mathbb{R})$ for which $f \notin C^{k+1}(\mathbb{R})$.


























We now turn to an understanding of the basic properties of $C^2(A)$. If $f \in C^2(A)$, then we have defined the "pure" partial derivatives

$$\frac{\partial^2 f}{\partial x_i^2} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_i}\right)$$

as well as the "mixed" partial derivatives

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)$$

for $i \ne j$. Our next task is the proof that if $f \in C^2(A)$, then

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$$

("the mixed partial derivatives are equal"). This result will clearly render calculations involving higher order derivatives much easier; we'll no longer have to keep track of the order of computing partial derivatives. Not only that, there are fewer that must be computed:











PROBLEM 3-11. If $f \in C^2(\mathbb{R}^2)$, then only three second order partial derivatives of $f$ need to be computed in order to know all four of its second order partial derivatives. Likewise, if $f \in C^2(\mathbb{R}^3)$, then only six need to be computed in order to know all nine. What is the result for the general case $f \in C^2(\mathbb{R}^n)$?





























What is essentially involved in proving a result like this is showing that the two limiting 
processes involved can be done in either order. You will probably not be surprised, then, that 
there is pathology to be overcome. The following example is found in just about every text 
on vector calculus. It is sort of a canonical example. 

















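PROBLEM 3-12. Define $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ by

$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\[2mm] 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

Prove that

a. $f \in C^1(\mathbb{R}^2)$,

b. $\dfrac{\partial^2 f}{\partial x^2}$, $\dfrac{\partial^2 f}{\partial x \partial y}$, $\dfrac{\partial^2 f}{\partial y \partial x}$, $\dfrac{\partial^2 f}{\partial y^2}$ exist on $\mathbb{R}^2$,

c. $\dfrac{\partial^2 f}{\partial x \partial y}(0, 0) \ne \dfrac{\partial^2 f}{\partial y \partial x}(0, 0)$.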
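The failure of symmetry in part c can be observed numerically. The following is a minimal sketch, assuming the function in question is the standard example $xy(x^2 - y^2)/(x^2 + y^2)$ extended by $0$ at the origin; the helper names and step sizes are illustrative choices, and central differences only approximate the derivatives.

```python
def f(x, y):
    """Assumed standard example: xy(x^2 - y^2)/(x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


def fx(x, y, h=1e-6):
    """Central-difference approximation to the first partial in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def fy(x, y, h=1e-6):
    """Central-difference approximation to the first partial in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def d_dy_of_fx_at_origin(k=1e-3):
    """Approximates d/dy (df/dx) at (0, 0): differentiate in x first, then y."""
    return (fx(0.0, k) - fx(0.0, -k)) / (2 * k)


def d_dx_of_fy_at_origin(k=1e-3):
    """Approximates d/dx (df/dy) at (0, 0): differentiate in y first, then x."""
    return (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)


print(d_dy_of_fx_at_origin(), d_dx_of_fy_at_origin())
```

The two printed values come out close to $-1$ and $+1$ respectively, so the order of differentiation matters at the origin for this function. The inner step `h` is deliberately much smaller than the outer step `k`; with comparable step sizes the nested difference quotients would blur the effect.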








All pathology of this nature can be easily eliminated by a reasonable assumption on the 
function, as we now demonstrate. 
THEOREM. Let $A \subset \mathbb{R}^n$ be an open set and let $f \in C^2(A)$. Then

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}.$$

PROOF. Since we need only consider a fixed pair $i$, $j$ in the proof, we may as well assume $i = 1$, $j = 2$. And since $x_3, \dots, x_n$ remain fixed in all our deliberations, we may also assume that $n = 2$, so that $A \subset \mathbb{R}^2$.

Let $x \in A$ be fixed, and let $\delta > 0$ and $\epsilon > 0$ be arbitrary but small enough that the points considered below belong to $A$ (remember, $x$ is an interior point of $A$).

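The first step in our proof is the writing of a combination of four values of $f$ near the point $x$. Then we shall perform a limiting procedure to obtain the desired result. Before defining the crucial expression we present the following guideline to our reasoning. Thus we hope that approximately

$$\frac{\partial f}{\partial x_1}(x_1, x_2) \approx \frac{f(x_1 + \delta, x_2) - f(x_1, x_2)}{\delta},$$

$$\frac{\partial f}{\partial x_1}(x_1, x_2 + \epsilon) \approx \frac{f(x_1 + \delta, x_2 + \epsilon) - f(x_1, x_2 + \epsilon)}{\delta}.$$

Now subtract and divide by $\epsilon$:

$$\frac{\partial^2 f}{\partial x_2 \partial x_1}(x_1, x_2) \approx \frac{\dfrac{\partial f}{\partial x_1}(x_1, x_2 + \epsilon) - \dfrac{\partial f}{\partial x_1}(x_1, x_2)}{\epsilon} \approx \frac{\dfrac{f(x_1 + \delta, x_2 + \epsilon) - f(x_1, x_2 + \epsilon)}{\delta} - \dfrac{f(x_1 + \delta, x_2) - f(x_1, x_2)}{\delta}}{\epsilon}.$$

Thus we hope that

$$\frac{\partial^2 f}{\partial x_2 \partial x_1}(x_1, x_2) \approx \delta^{-1}\epsilon^{-1}\left[f(x_1 + \delta, x_2 + \epsilon) - f(x_1, x_2 + \epsilon) - f(x_1 + \delta, x_2) + f(x_1, x_2)\right].$$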





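Now we come to the actual proof. Let

$$D(\delta, \epsilon) = f(x_1 + \delta, x_2 + \epsilon) - f(x_1, x_2 + \epsilon) - f(x_1 + \delta, x_2) + f(x_1, x_2).$$

Then define the auxiliary function (this is a trick!) of one real variable,

$$g(y) = f(x_1 + \delta, y) - f(x_1, y).$$

By the mean value theorem of single-variable calculus,

$$g(x_2 + \epsilon) - g(x_2) = \epsilon\, g'(t)$$

for some $x_2 < t < x_2 + \epsilon$. That is,

$$D(\delta, \epsilon) = \epsilon\left[\frac{\partial f}{\partial x_2}(x_1 + \delta, t) - \frac{\partial f}{\partial x_2}(x_1, t)\right].$$

Now we use the mean value theorem once again, this time for the term in brackets:

$$\frac{\partial f}{\partial x_2}(x_1 + \delta, t) - \frac{\partial f}{\partial x_2}(x_1, t) = \delta\, \frac{\partial^2 f}{\partial x_1 \partial x_2}(s, t)$$

for some $x_1 < s < x_1 + \delta$. The result is

$$D(\delta, \epsilon) = \delta\epsilon\, \frac{\partial^2 f}{\partial x_1 \partial x_2}(s, t).$$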


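Here is a diagram illustrating this situation:

[Figure: the rectangle with corners $(x_1, x_2)$, $(x_1 + \delta, x_2)$, $(x_1, x_2 + \epsilon)$, $(x_1 + \delta, x_2 + \epsilon)$. Each corner carries the sign with which $f$ enters $D(\delta, \epsilon)$: plus at $(x_1, x_2)$ and $(x_1 + \delta, x_2 + \epsilon)$, minus at $(x_1, x_2 + \epsilon)$ and $(x_1 + \delta, x_2)$. The horizontal segment from $(x_1, t)$ to $(x_1 + \delta, t)$ contains the point $(s, t)$; a second interior point $(s', t')$ is also marked.]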


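Now carry out the very same procedure, except with the roles of $x_1$ and $x_2$ interchanged. The result can be immediately written down as

$$D(\delta, \epsilon) = \delta\epsilon\, \frac{\partial^2 f}{\partial x_2 \partial x_1}(s', t').$$

Comparing the two equations we have thus obtained, we find after dividing by $\delta\epsilon$,

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(s, t) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(s', t').$$

Finally, we let $\delta \to 0$ and $\epsilon \to 0$, so that both $(s, t)$ and $(s', t')$ tend to $(x_1, x_2)$, and at this time use the continuity of the two mixed partial derivatives to obtain

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(x_1, x_2) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(x_1, x_2).$$

QED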
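The central object of the proof, the four-point combination $D(\delta, \epsilon)$, is also a practical finite-difference formula. The sketch below evaluates $D(h, h)/h^2$ for a smooth test function and compares it with the mixed partial computed by hand. The test function and the base point are hypothetical choices made only for illustration.

```python
import math


def f(x1, x2):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.sin(x1) * math.exp(x2) + x1 * x1 * x2


def second_difference(x1, x2, delta, eps):
    """D(delta, eps): signed sum of f over the four corners of the rectangle."""
    return (f(x1 + delta, x2 + eps) - f(x1, x2 + eps)
            - f(x1 + delta, x2) + f(x1, x2))


def exact_mixed(x1, x2):
    # For this particular f, both mixed partials equal cos(x1) e^{x2} + 2 x1.
    return math.cos(x1) * math.exp(x2) + 2.0 * x1


x1, x2 = 0.3, -0.2
for h in (1e-1, 1e-2, 1e-3):
    approx = second_difference(x1, x2, h, h) / (h * h)
    print(h, approx, exact_mixed(x1, x2))
```

The quotient approaches the exact value as `h` shrinks, which is what the identity $D(\delta, \epsilon) = \delta\epsilon\, \partial^2 f/\partial x_1 \partial x_2 (s, t)$ predicts. Taking `h` far smaller than shown would eventually let floating-point cancellation dominate.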


REMARK. The proof we have presented certainly does not require the full hypothesis that $f \in C^2(A)$. It requires only that each of $\partial^2 f/\partial x_i \partial x_j$ and $\partial^2 f/\partial x_j \partial x_i$ be continuous at the point in question. As I have never seen an application of this observation, I have chosen to state the theorem in its weaker version. This will provide all we shall need in practice. However, the weakened hypothesis does indeed give an interesting result. Actually, the same proof outline provides even better results with very little additional effort:


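PROBLEM 3-13*. Assume only that $\partial f/\partial x_i$ and $\partial f/\partial x_j$ exist in $A$ and that $\partial^2 f/\partial x_i \partial x_j$ exists in $A$ and is continuous at $x$. Prove that $\partial^2 f/\partial x_j \partial x_i$ exists at $x$ and that

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i} \quad \text{at } x.$$

(HINT: start with the equation derived above,

$$\delta^{-1}\epsilon^{-1} D(\delta, \epsilon) = \frac{\partial^2 f}{\partial x_1 \partial x_2}(s, t).$$

The right side of this equation has a limit as $\delta$ and $\epsilon$ tend to zero. Therefore so does the left side. Perform this limit in the iterated fashion

$$\lim_{\epsilon \to 0}\left(\lim_{\delta \to 0}\,(\text{left hand side})\right)$$

to obtain the result.)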
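PROBLEM 3-14*. Assume only that $\partial f/\partial x_i$ and $\partial f/\partial x_j$ exist in $A$ and are themselves differentiable at $x$. Prove that

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i} \quad \text{at } x.$$

(Hint: start with the expression for $D(\delta, \epsilon)$ on p. 3-6. Then use the hypothesis that $\partial f/\partial x_2$ is differentiable at $x$ to write

$$\frac{\partial f}{\partial x_2}(x_1 + \delta, t) = \frac{\partial f}{\partial x_2}(x_1, x_2) + \frac{\partial^2 f}{\partial x_1 \partial x_2}(x_1, x_2)\,\delta + \frac{\partial^2 f}{\partial x_2^2}(x_1, x_2)(t - x_2) + \text{small multiple of } (\delta + (t - x_2)),$$

$$\frac{\partial f}{\partial x_2}(x_1, t) = \frac{\partial f}{\partial x_2}(x_1, x_2) + \frac{\partial^2 f}{\partial x_2^2}(x_1, x_2)(t - x_2) + \text{small multiple of } (t - x_2).$$

Conclude that

$$D(\delta, \epsilon) = \delta\epsilon\, \frac{\partial^2 f}{\partial x_1 \partial x_2}(x_1, x_2) + \epsilon \cdot (\text{small multiple of } (\delta + \epsilon))$$

and carefully analyze how small the remainder is. Conclude that the limit of $\epsilon^{-2} D(\epsilon, \epsilon)$ as $\epsilon$ tends to zero is the value of $\partial^2 f/\partial x_1 \partial x_2$ at $(x_1, x_2)$.)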


PROBLEM 3-15. Prove that if $f$ is of class $C^k$ on an open set $A$, then any mixed partial derivative of $f$ of order $k$ can be computed without regard to the order of performing the various differentiations.


B. Taylor’s theorem (Brook Taylor, 1712) 











We are now preparing for our study of critical points of functions on R”, and we specifically 
want to devise a “second derivative test” for the type of critical point. The efficient theoretical 
way to study this question is to approximate the given function by an expression (the Taylor 
polynomial) which is a polynomial of degree 2. The device for doing this is called Taylor’s 
theorem. 

We begin with a quick review of the single-variable calculus version. Suppose that $g$ is a function from $\mathbb{R}$ to $\mathbb{R}$ which is defined and of class $C^2$ near the origin. Then the fundamental theorem of calculus implies

$$g(t) = g(0) + \int_0^t g'(s)\,ds.$$

Integrate partially after a slight adjustment:

$$\begin{aligned}
g(t) &= g(0) - \int_0^t g'(s)\,\frac{d}{ds}(t - s)\,ds \\
&= g(0) - g'(s)(t - s)\Big|_{s=0}^{s=t} + \int_0^t g''(s)(t - s)\,ds \\
&= g(0) + g'(0)t + \int_0^t g''(s)(t - s)\,ds.
\end{aligned}$$

Now we slightly adjust the remaining integral by writing

$$g''(s) = g''(0) + \left(g''(s) - g''(0)\right)$$

and doing the explicit integration with the $g''(0)$ term:

$$\begin{aligned}
\int_0^t g''(s)(t - s)\,ds &= g''(0)\int_0^t (t - s)\,ds + R \\
&= g''(0)\left[-\frac{(t - s)^2}{2}\right]_{s=0}^{s=t} + R \\
&= \tfrac{1}{2} g''(0)t^2 + R.
\end{aligned}$$

Here $R$, the "remainder," is of course

$$R = \int_0^t \left(g''(s) - g''(0)\right)(t - s)\,ds.$$

Suppose

$$|g''(s) - g''(0)| \le Q \quad \text{for } s \text{ between } 0 \text{ and } t.$$

Then

$$|R| \le \left|\int_0^t Q(t - s)\,ds\right| = \tfrac{1}{2} Q t^2.$$

(The absolute value signs around the integral are necessary if $t < 0$.) So the version of Taylor's theorem we obtain is

$$g(t) = g(0) + g'(0)t + \tfrac{1}{2} g''(0)t^2 + R,$$
where

$$|R| \le \tfrac{1}{2} Q t^2.$$

(By the way, if $g$ is of class $C^3$, then $|g''(s) - g''(0)| \le C|s|$ for some constant $C$ (Lipschitz condition), so that

$$|R| \le \left|\int_0^t C s (t - s)\,ds\right| = \tfrac{1}{6} C |t|^3.$$

Thus the remainder term in the Taylor series tends to zero one degree faster than the quadratic terms in the Taylor polynomial. This is the usual way of thinking of Taylor approximations.)
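The bound $|R| \le \frac{1}{2} Q t^2$ can be checked numerically on a concrete function. The sketch below uses $g = \cos$ purely as an illustrative choice, for which $g'(0) = 0$, $g''(0) = -1$ and $g''(s) - g''(0) = 1 - \cos s$. The function names are hypothetical, and $Q$ is estimated by sampling, which is exact here only because $|1 - \cos s|$ is monotone on the short intervals used.

```python
import math


def taylor2_remainder(t):
    """R = g(t) - (g(0) + g'(0) t + g''(0) t^2 / 2) for g = cos."""
    return math.cos(t) - (1.0 + 0.0 * t - 0.5 * t * t)


def sup_second_derivative_gap(t, samples=1000):
    """Sampled sup of |g''(s) - g''(0)| = |1 - cos s| for s between 0 and t."""
    return max(abs(1.0 - math.cos(t * k / samples)) for k in range(samples + 1))


for t in (0.5, -0.5, 0.1):
    R = taylor2_remainder(t)
    bound = 0.5 * sup_second_derivative_gap(t) * t * t
    print(t, abs(R), bound)
```

In each printed row the middle column stays below the right column, and both shrink faster than $t^2$ as $t$ gets smaller, in line with the parenthetical remark about $C^3$ functions.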











Now we turn to the case $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$, and we work first with the Taylor series centered at the origin. (We can easily extend to other points later.) We assume $f$ is of class $C^2$ near $0$. If $x \in \mathbb{R}^n$ is close to $0$, we use the familiar ploy of moving between $0$ and $x$ along a straight line: we thus define the composite function

$$g(t) = f(tx) \quad \text{for } 0 \le t \le 1,$$

regarding $x$ as fixed but small. We then have as above, with $t = 1$,

$$g(1) = g(0) + g'(0) + \tfrac{1}{2} g''(0) + R,$$

where

$$|R| \le \tfrac{1}{2} Q,$$

where

$$|g''(s) - g''(0)| \le Q \quad \text{for } 0 \le s \le 1.$$

Now we apply the chain rule:

$$g'(t) = \sum_{i=1}^n D_i f(tx)\, x_i$$

($D_i f = \partial f/\partial x_i$) and again

$$g''(t) = \sum_{i=1}^n \sum_{j=1}^n (D_j D_i f)(tx)\, x_i x_j.$$

In particular,

$$g'(0) = \sum_{i=1}^n D_i f(0)\, x_i,$$

$$g''(0) = \sum_{i,j=1}^n D_i D_j f(0)\, x_i x_j.$$


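We now see what the statement of the theorem should be:

THEOREM. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be of class $C^2$ near $0$. Let $\epsilon > 0$. Then there exists $\delta > 0$ such that if $\|x\| \le \delta$, then

$$f(x) = f(0) + \sum_{i=1}^n D_i f(0)\, x_i + \frac{1}{2} \sum_{i,j=1}^n D_i D_j f(0)\, x_i x_j + R,$$

where

$$|R| \le \epsilon \|x\|^2.$$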


PROOF. We continue with the above development. Because $f$ is of class $C^2$, its second order partial derivatives are all continuous. Hence given $\epsilon > 0$, we may choose $\delta > 0$ such that for all $\|y\| \le \delta$,

$$|D_i D_j f(y) - D_i D_j f(0)| \le \epsilon.$$

Then if $\|x\| \le \delta$, we have a pretty crude estimate valid for $0 \le s \le 1$:

$$\begin{aligned}
|g''(s) - g''(0)| &= \left|\sum_{i,j=1}^n \left(D_i D_j f(sx) - D_i D_j f(0)\right) x_i x_j\right| \\
&\le \sum_{i,j=1}^n \epsilon\, |x_i|\, |x_j| = \epsilon \left(\sum_{i=1}^n |x_i|\right)^2 \\
&\le \epsilon \sum_{i=1}^n 1^2 \cdot \sum_{i=1}^n x_i^2 \quad \text{(Schwarz inequality)} \\
&= \epsilon n \|x\|^2.
\end{aligned}$$

The remainder term above therefore satisfies

$$|R| \le \tfrac{1}{2} \epsilon n \|x\|^2.$$

To obtain the final result, first replace $\epsilon$ by $2\epsilon/n$.

QED
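The content of the theorem is that $|R|/\|x\|^2$ can be made as small as we like by taking $x$ close to $0$. The sketch below illustrates this for $n = 2$ with a hypothetical test function whose first and second partial derivatives at the origin were worked out by hand; all names and the function itself are illustrative assumptions.

```python
import math


def f(x, y):
    # Hypothetical C^2 test function, chosen only for illustration.
    return math.exp(x) * math.cos(y) + x * y


# Values at the origin, computed by hand for this particular f.
F0 = 1.0
GRAD0 = (1.0, 0.0)                       # (D1 f, D2 f)(0, 0)
HESS0 = ((1.0, 1.0), (1.0, -1.0))        # D_i D_j f(0, 0)


def taylor2(x, y):
    """Second-degree Taylor polynomial of f at the origin."""
    v = (x, y)
    linear = sum(GRAD0[i] * v[i] for i in range(2))
    quadratic = sum(HESS0[i][j] * v[i] * v[j]
                    for i in range(2) for j in range(2))
    return F0 + linear + 0.5 * quadratic


def remainder_ratio(x, y):
    """|R| / ||x||^2, which the theorem says tends to 0 as (x, y) -> 0."""
    return abs(f(x, y) - taylor2(x, y)) / (x * x + y * y)


for scale in (1.0, 0.1, 0.01):
    print(scale, remainder_ratio(0.1 * scale, 0.2 * scale))
```

The printed ratio drops roughly in proportion to the scale factor, which is the faster-than-quadratic decay of the remainder one expects for a function that is in fact smoother than $C^2$.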


B’. Taylor’s theorem bis 


There is actually a much different proof of Taylor’s theorem which enables us to weaken 
the hypothesis significantly and still obtain the same conclusion. As far as applications to 
calculus are concerned, the C? version we have proved above is certainly adequate, but the 
improved version presented here is nevertheless quite interesting for its minimal hypothesis. 





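THEOREM. Let $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ be differentiable in a neighborhood of $0$, and assume each partial derivative $\partial f/\partial x_i$ is differentiable at $0$. Let $\epsilon > 0$. Then there exists $\delta > 0$ such that if $\|x\| \le \delta$, then

$$f(x) = f(0) + \sum_{i=1}^n D_i f(0)\, x_i + \frac{1}{2} \sum_{i,j=1}^n D_i D_j f(0)\, x_i x_j + R,$$

where

$$|R| \le \epsilon \|x\|^2.$$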

PROOF. We still use the function $g(t) = f(tx)$, and we still have the chain rule

$$g'(t) = \sum_{i=1}^n D_i f(tx)\, x_i.$$

However, now we use the hypothesis that $D_i f$ is differentiable at $0$ as follows: for any $\epsilon > 0$ there exists $\delta > 0$ such that

$$D_i f(y) = D_i f(0) + \sum_{j=1}^n D_j D_i f(0)\, y_j + R_i(y),$$

where

$$|R_i(y)| \le \epsilon \|y\| \quad \text{for all } \|y\| \le \delta.$$

Therefore, if $\|x\| \le \delta$ and $0 \le t \le 1$, we obtain

$$g'(t) = \sum_{i=1}^n D_i f(0)\, x_i + \sum_{i=1}^n \sum_{j=1}^n D_j D_i f(0)\, t x_j x_i + R(t, x),$$

where

$$\begin{aligned}
|R(t, x)| &= \left|\sum_{i=1}^n x_i R_i(tx)\right| \\
&\le \sum_{i=1}^n |x_i|\, \epsilon t \|x\| \\
&\le \epsilon t n \|x\|^2.
\end{aligned}$$

At this point we would like to apply a theorem from single-variable calculus known as the Cauchy mean value theorem. This is used in most proofs of l'Hôpital's rule. However, instead of appealing to this theorem we insert its proof in this context. Define

$$\varphi(t) = g(t) - g(0) - g'(0)t - \left[g(1) - g(0) - g'(0)\right]t^2,$$

noting that $\varphi(0) = \varphi(1) = 0$ and that $\varphi$ is differentiable for $0 \le t \le 1$. Thus the mean value theorem implies $\varphi'(t) = 0$ for some $0 < t < 1$. That is,

$$\begin{aligned}
\left[g(1) - g(0) - g'(0)\right] 2t &= g'(t) - g'(0) \\
&= \sum_{i,j=1}^n D_i D_j f(0)\, t x_i x_j + R(t, x).
\end{aligned}$$

Dividing by $2t$ gives

$$f(x) - f(0) - \sum_{i=1}^n D_i f(0)\, x_i = \frac{1}{2} \sum_{i,j=1}^n D_j D_i f(0)\, x_i x_j + \frac{R(t, x)}{2t},$$

where

$$\left|\frac{R(t, x)}{2t}\right| \le \tfrac{1}{2} \epsilon n \|x\|^2.$$

QED
PROBLEM 3-16. Here's a simplification of the proof of the theorem in case $n = 1$. Calculate

$$\lim_{x \to 0} \frac{f(x) - f(0) - f'(0)x}{x^2}$$

and then deduce the theorem.


The hypothesis of this version of Taylor’s theorem is rather minimal. It is strong enough 
to guarantee the equality of the mixed second order partial derivatives of f at 0, as we realize 
from Problem 3-14 (although we did not require this fact in the proof). The following problem 
shows that there is little hope of further weakening the hypothesis. 


PROBLEM 3-17. Let $f$ be the "canonical" pathological function of Problem 3-12. This function is of class $C^1$ on $\mathbb{R}^2$ and has the origin as a critical point. Show that there do not exist constants $c_1$, $c_2$, $c_3$ such that

$$f(x, y) = c_1 x^2 + c_2 xy + c_3 y^2 + R(x, y)$$

and

$$\lim_{(x, y) \to (0, 0)} \frac{R(x, y)}{x^2 + y^2} = 0.$$





C. The second derivative test for R? 


To get started with this analysis, here is a problem which you very much need to investigate 
carefully. 
PROBLEM 3-18. Let $A$, $B$, $C$ be real numbers such that

$$A > 0, \quad C > 0, \quad AC - B^2 > 0.$$

a. Prove that a number $\lambda > 0$ exists such that for all $(x, y) \in \mathbb{R}^2$,

$$Ax^2 + 2Bxy + Cy^2 \ge \lambda (x^2 + y^2).$$

b. Calculate the largest possible $\lambda$.

(Answer: $\dfrac{A + C}{2} - \sqrt{\left(\dfrac{A + C}{2}\right)^2 - (AC - B^2)}$)
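The stated answer can be sanity-checked numerically. By homogeneity, the best $\lambda$ is the minimum of the quadratic form over the unit circle, so the sketch below compares the closed-form expression $\frac{A+C}{2} - \sqrt{\left(\frac{A+C}{2}\right)^2 - (AC - B^2)}$ with a brute-force minimum over sampled angles. The function names and the sample coefficients are hypothetical.

```python
import math


def largest_lambda(A, B, C):
    """Closed form: (A + C)/2 - sqrt(((A + C)/2)^2 - (AC - B^2))."""
    half_trace = 0.5 * (A + C)
    return half_trace - math.sqrt(half_trace * half_trace - (A * C - B * B))


def min_on_unit_circle(A, B, C, samples=10000):
    """Brute-force minimum of A x^2 + 2 B x y + C y^2 over sampled unit vectors."""
    best = float("inf")
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        x, y = math.cos(theta), math.sin(theta)
        best = min(best, A * x * x + 2.0 * B * x * y + C * y * y)
    return best


A, B, C = 2.0, 1.0, 3.0   # hypothetical coefficients with A, C > 0 and AC - B^2 > 0
print(largest_lambda(A, B, C), min_on_unit_circle(A, B, C))
```

The two printed numbers agree to several decimal places. The quantity under the square root equals $\left(\frac{A - C}{2}\right)^2 + B^2$, so it is never negative and the closed form is always well defined.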





Now we are ready to give the second derivative test. We suppose that we are dealing with a function $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ which is of class $C^2$ in a neighborhood of the origin (we shall later immediately generalize to any point). Using coordinates $(x, y)$ for $\mathbb{R}^2$, we obtain from Taylor's theorem in Section B

$$f(x, y) = \underbrace{f(0, 0)}_{\text{0 order term}} + \underbrace{D_1 f(0, 0)x + D_2 f(0, 0)y}_{\text{1st order terms}} + \underbrace{\tfrac{1}{2}\left(D_{11} f(0, 0)x^2 + 2 D_{12} f(0, 0)xy + D_{22} f(0, 0)y^2\right)}_{\text{2nd order terms}} + \underbrace{R(x, y)}_{\text{remainder term}},$$

and the remainder term satisfies the condition that for any $\epsilon > 0$ there exists $\delta > 0$ such that

$$x^2 + y^2 \le \delta^2 \implies |R(x, y)| \le \epsilon (x^2 + y^2).$$

(Here we have continued to employ the notation $D_{11} f = \partial^2 f/\partial x^2$ etc.)

Now we suppose that $(0, 0)$ is a critical point for $f$. That is, by definition,

$$D_1 f(0, 0) = D_2 f(0, 0) = 0.$$

Let us now denote

$$A = D_{11} f(0, 0) = \partial^2 f/\partial x^2 (0, 0),$$
$$B = D_{12} f(0, 0) = \partial^2 f/\partial x \partial y (0, 0),$$
$$C = D_{22} f(0, 0) = \partial^2 f/\partial y^2 (0, 0).$$

Then the Taylor expression for $f$ takes the form

$$f(x, y) = f(0, 0) + \tfrac{1}{2}\left(Ax^2 + 2Bxy + Cy^2\right) + R(x, y).$$
The underlying rationale for the second derivative test is that in the Taylor expression for $f$ the first order terms are absent, thanks to the fact that we have a critical point; therefore we expect that the second order terms will determine the behavior of the function near the critical point. In fact, this usually happens, as we now show.

Let us first assume that the numbers $A$, $B$, $C$ satisfy the inequalities

$$A > 0 \text{ and } C > 0, \qquad AC - B^2 > 0.$$

We know from Problem 3-18 that there exists a positive number $\lambda$ (depending only on $A$, $B$, $C$) such that

$$Ax^2 + 2Bxy + Cy^2 \ge \lambda (x^2 + y^2) \quad \text{for all } (x, y) \in \mathbb{R}^2.$$

Therefore, $x^2 + y^2 \le \delta^2 \implies$

$$\begin{aligned}
f(x, y) &\ge f(0, 0) + \tfrac{1}{2}\lambda (x^2 + y^2) - |R(x, y)| \\
&\ge f(0, 0) + \tfrac{1}{2}\lambda (x^2 + y^2) - \epsilon (x^2 + y^2) \\
&= f(0, 0) + \left(\tfrac{1}{2}\lambda - \epsilon\right)(x^2 + y^2).
\end{aligned}$$

This inequality is exactly what we need! For we may choose $0 < \epsilon < \tfrac{1}{2}\lambda$ and then conclude with the corresponding $\delta$ that

$$0 < x^2 + y^2 \le \delta^2 \implies f(x, y) > f(0, 0).$$

The conclusion: at the critical point $0$, $f$ has a strict local minimum. ("Strict" in this context means that near $(0, 0)$ the equality $f(x, y) = f(0, 0)$ holds only if $(x, y) = (0, 0)$.)

If we assume instead that

$$A < 0 \text{ and } C < 0, \qquad AC - B^2 > 0,$$

then we obtain the corresponding conclusion: at the critical point $0$, $f$ has a strict local maximum. The proof is immediate: we could either repeat the above type of analysis, or we could simply replace $f$ by $-f$ and use the previous result. For replacing $f$ by $-f$ also replaces $A$, $B$, $C$ by their negatives, so that the assumed condition for $f$ becomes the previous condition for $-f$.
Finally, assume

$$AC - B^2 < 0.$$

The quadratic form in question, $Ax^2 + 2Bxy + Cy^2$, can now be positive at some points, negative at others. We check this quickly. If $A \ne 0$, then the equation $Ax^2 + 2Bxy + Cy^2 = 0$ has distinct roots

$$\frac{x}{y} = \frac{-B \pm \sqrt{B^2 - AC}}{A}$$

and therefore changes sign. The reason is clear. Renaming $x/y$ as $t$, the function $At^2 + 2Bt + C$ has a graph which is a parabola (since $A \ne 0$). It hits the $t$-axis at two distinct points, so that its position is that of a parabola that genuinely crosses the $t$-axis. Likewise if $C \ne 0$. If both $A$ and $C$ are $0$, then we are dealing with $2Bxy$, which has opposite signs at (say) $(1, 1)$ and $(1, -1)$. (Notice that our assumption implies $B \ne 0$ in this case.)

Now suppose $(x_0, y_0)$ satisfies

$$Ax_0^2 + 2Bx_0y_0 + Cy_0^2 = a < 0.$$

Then for small $|t|$ we have

$$\begin{aligned}
f(tx_0, ty_0) &= f(0, 0) + \tfrac{1}{2} a t^2 + R(tx_0, ty_0) \\
&\le f(0, 0) + \tfrac{1}{2} a t^2 + \epsilon t^2 (x_0^2 + y_0^2) \\
&= f(0, 0) + t^2 \left(\tfrac{1}{2} a + \epsilon (x_0^2 + y_0^2)\right).
\end{aligned}$$

Thus if $\epsilon$ is small enough that $\tfrac{1}{2} a + \epsilon (x_0^2 + y_0^2) < 0$ and $|t|$ is sufficiently small and not zero,

$$f(tx_0, ty_0) < f(0, 0).$$

Thus in any neighborhood of $(0, 0)$ there are points where $f < f(0, 0)$.

In the same way we show that in any neighborhood of $(0, 0)$ there are points where $f > f(0, 0)$. The conclusion: at the critical point $0$, $f$ has neither a local minimum nor a local maximum value.

To recapitulate, all the above analysis is based on the same observation: at a critical point $(0, 0)$, the behavior of $f$ is governed by the quadratic terms in its Taylor expansion, at least in the three cases we have considered. These three cases are all subcategories of the assumption

$$AC - B^2 \ne 0.$$

For if $AC - B^2 > 0$, then $AC > 0$ and we see that $A$ and $C$ are either both positive or both negative. If $AC - B^2 < 0$, that is the third case.

The conclusion is that, if $AC - B^2 \ne 0$, then the quadratic terms in the Taylor expansion of $f$ are determinative. If $AC - B^2 = 0$, we have no criterion.

For example, the function

$$f(x, y) = x^4 + y^4$$

has the origin as a strict local minimum; its negative has the origin as a strict local maximum. And the function

$$f(x, y) = x^4 - y^4$$

has a critical point at the origin, which is neither a local minimum nor a local maximum. All these examples satisfy $AC - B^2 = 0$.


PROBLEM 3-19. Find three functions f(x,y) which have (0,0) as a critical point 
and have AC = B = 1 (so that AC — B? = 0) such that the critical point (0,0) is 

a. a strict local minimum for one of them, 

b. a strict local maximum for another, 


c. neither a local minimum nor a local maximum for the other. 


D. The nature of critical points 

















This section contains mostly definitions. We suppose that $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ and that $f$ is of class $C^1$ near a critical point $x_0$. Thus $\nabla f(x_0) = 0$. There are just three types of behavior for $f$ near $x_0$:

$f$ has a local minimum at $x_0$:
$f(x) \ge f(x_0)$ for all $x \in B(x_0, \epsilon)$ for some $\epsilon > 0$.

$f$ has a local maximum at $x_0$:
$f(x) \le f(x_0)$ for all $x \in B(x_0, \epsilon)$ for some $\epsilon > 0$.

$f$ has a saddle point at $x_0$:
neither of the above two conditions holds; in other words, for all $\epsilon > 0$ there exist $x$, $x' \in B(x_0, \epsilon)$ such that $f(x) < f(x_0) < f(x')$.

There is a further refinement in the first two cases:

$f$ has a strict local minimum at $x_0$:
$f(x) > f(x_0)$ for all $0 < \|x - x_0\| < \epsilon$ for some $\epsilon > 0$.

$f$ has a strict local maximum at $x_0$:
$f(x) < f(x_0)$ for all $0 < \|x - x_0\| < \epsilon$ for some $\epsilon > 0$.

In Section C we analyzed the $\mathbb{R}^2$ case when the critical point was the origin. That is really no loss of generality in view of the correspondence between the function $f$ and its translated version $g$ given by

$$g(x) = f(x_0 + x), \quad x \in \mathbb{R}^n.$$

For $f$ has a critical point at $x_0$ $\iff$ $g$ has a critical point at $0$. Moreover, the nature of the critical points is the same. Moreover,

$$D_i D_j g(0) = D_i D_j f(x_0),$$

so the information coming from the second derivatives is the same for $f$ and $g$.





SUMMARY. We take this opportunity to summarize the $\mathbb{R}^2$ case. Assume

$\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ is of class $C^2$ near $x_0$, and $x_0$ is a critical point of $f$.

Define

$$A = D_1 D_1 f(x_0),$$
$$B = D_1 D_2 f(x_0),$$
$$C = D_2 D_2 f(x_0).$$

Assume

$$AC - B^2 \ne 0.$$

Then we have the results:

$A > 0$ and $C > 0$, $AC - B^2 > 0$ $\implies$ $f$ has a strict local minimum at $x_0$.

$A < 0$ and $C < 0$, $AC - B^2 > 0$ $\implies$ $f$ has a strict local maximum at $x_0$.

$AC - B^2 < 0$ $\implies$ $f$ has a saddle point at $x_0$.

In case $AC - B^2 = 0$, it is impossible to tell from $A$, $B$, $C$ what the nature of the critical point is. Here are some easy examples for the critical point $(0, 0)$ in the $x$-$y$ plane:
1. $f(x, y) = x^2$: local minimum

2. $f(x, y) = -x^2$: local maximum (these are even global extrema)

3. $f(x, y) = x^2 + y^4$: strict local minimum

4. $f(x, y) = -x^2 - y^4$: strict local maximum

5. $f(x, y) = x^2 - y^4$: saddle point
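The summary translates directly into a small decision procedure. The following is a minimal sketch; the function name and the returned strings are hypothetical, and the exact comparisons with zero assume that $A$, $B$, $C$ are known exactly (with floating-point second derivatives one would need a tolerance).

```python
def classify_critical_point(A, B, C):
    """Second derivative test in R^2 from the second partials A, B, C."""
    disc = A * C - B * B
    if disc > 0:
        # Here AC > B^2 >= 0, so A and C are nonzero and have the same sign.
        return "strict local minimum" if A > 0 else "strict local maximum"
    if disc < 0:
        return "saddle point"
    return "degenerate: no conclusion"


print(classify_critical_point(2, 0, 1))    # quadratic part x^2 + y^2/2
print(classify_critical_point(0, 1, 0))    # f(x, y) = xy
print(classify_critical_point(2, 0, 0))    # f(x, y) = x^2, or x^2 - y^4, ...
```

The last call shows why the degenerate case must be reported separately: all five examples in the list above share $AC - B^2 = 0$, yet they exhibit every possible kind of behavior.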


In each of the next five problems determine the nature of each critical point of the indicated 
function on R?. 


PROBLEM 3-20. $f(x, y) = \tfrac{1}{3}x^3 + \tfrac{1}{2}y^2 + 2xy + 5x + y$.

(Answer: one local minimum, one saddle)

PROBLEM 3-22. $f(x, y) = xy(12 - 3x - 4y)$.

PROBLEM 3-23. $f(x, y) = x^4 + y^4 - 2x^2 + 4xy - 2y^2$.

PROBLEM 3-24. $f(x, y) = (ax^2 + by^2)e^{-x^2 - y^2}$ ($a > b > 0$ are constants).


PROBLEM 3-25. The function of Problem 2-56 has just the critical point (1,1). Use 
the above test to determine its local nature. 


(Answer: local minimum) 


PROBLEM 3-26. The function of Problem 2-64 has just the critical point (1,0), and 
it was shown to be a local maximum. Use the above test to determine its local nature. 
PROBLEM 3-27. Here's a function given me by Richard Stong:
f (v,y) = 240 + 6x7y? + 6a?y — 2? + xy’. 


Show that it has just one critical point, which is a strict local maximum but not a global 
maximum. (Stong’s point: this provides the same moral as does Problem 2-64, but with 
a polynomial function.) 


There is a beautiful geometric way to think of these criteria, in terms of level sets. For instance, suppose $\mathbb{R}^2 \xrightarrow{f} \mathbb{R}$ has a critical point at $(0, 0)$, and that $A = 2$, $B = 0$, $C = 1$, so we know the critical point is a minimum. If $f$ were just a quadratic polynomial, then

$$f(x, y) = f(0, 0) + x^2 + \tfrac{1}{2}y^2.$$


The level set description of f then is concentric ellipses. In case f is not a quadratic polynomial, 
then near the critical point the level sets will still resemble ellipses, but with some distortion. 


The distortion decreases as we approach the critical point.

[Figure: level sets near the critical point, resembling distorted concentric ellipses.]


Likewise, the level sets of a quadratic polynomial which is a saddle point are concentric 
hyperbolas, but in general they will resemble distorted hyperbolas. 
E. The Hessian matrix

Now we return to the general $n$-dimensional case, so we shall be assuming throughout this section that

$$\mathbb{R}^n \xrightarrow{f} \mathbb{R}$$

is a function of class $C^2$ near its critical point $x_0$. The Taylor expansion from Section B therefore becomes

$$f(x_0 + y) = f(x_0) + \frac{1}{2} \sum_{i,j=1}^n D_i D_j f(x_0)\, y_i y_j + R(y),$$

where for any $\epsilon > 0$ the remainder satisfies

$$|R(y)| \le \epsilon \|y\|^2 \quad \text{if } \|y\| \text{ is sufficiently small.}$$


We are going to try to analyze this situation just as we did for the case n = 2, hoping that the 
quadratic terms in the Taylor expansion are determinative. In order to begin our deliberations, 
we single out the essence of the quadratic terms by giving the following terminology. 


DEFINITION. The Hessian matrix of $f$ at its critical point $x_0$ is the $n \times n$ matrix

$$H = \left(\frac{\partial^2 f}{\partial x_i \partial x_j}(x_0)\right).$$

In case $n = 2$ and in the notation of the preceding sections,

$$H = \begin{pmatrix} A & B \\ B & C \end{pmatrix}.$$

In case $n = 1$ we have simply

$$H = f''(x_0).$$

Notice that in all cases the fact that $f$ is of class $C^2$ implies that $D_i D_j f(x_0) = D_j D_i f(x_0)$, in other words that $H$ is a symmetric matrix.


Here is a crucial concept: 


DEFINITION. In the above situation the critical point $x_0$ is nondegenerate if $\det H \ne 0$. And it is degenerate if $\det H = 0$.

Our goal is to establish a "second derivative test" to determine the local nature of a critical point. This will prove to be possible only if the critical point is nondegenerate. For $n = 1$ this means $f''(x_0) \ne 0$, and standard single-variable calculus gives a strict local minimum if $f''(x_0) > 0$ and a strict local maximum if $f''(x_0) < 0$ (and no conclusion if $f''(x_0) = 0$). For $n = 2$ the nondegeneracy means $AC - B^2 \ne 0$, and the results are given on p. 3-20.

To repeat: if the critical point is degenerate, no test involving only the second order partial 
derivatives of f can detect the nature of a critical point. We shall therefore restrict our analysis 
to nondegenerate critical points. 


REMARK. If $n = 1$ there is no such thing as a nondegenerate saddle point. Thus the structure of critical point behavior is much richer in multivariable calculus than in single-variable calculus. It's a good idea to keep in mind easy examples of nondegenerate saddle points for $n = 2$, such as the origin for functions such as:

$$f(x, y) = xy,$$

or

$$f(x, y) = x^2 - y^2,$$

or

$$f(x, y) = x^2 + 5xy + y^2.$$


Here is a simple but very interesting example which illustrates the sort of thing that can 
happen at degenerate critical points. 
PROBLEM 3-28. Define

$$f(x, y) = (x - y^2)(x - 2y^2).$$
a. Show that (0,0) is the only critical point. 
b. Show that it is degenerate. 


c. Show that when f is restricted to any straight line through (0,0), the resulting 
function has a strict local minimum at (0,0). 


d. Show that (0,0) is a saddle point. 


We are going to discover that quadratic polynomials are much more complicated alge- 
braically for dimensions greater than two than for dimension two. 


PROBLEM 3-29. The quadratic polynomial below corresponds to a Hessian matrix

$$\begin{pmatrix} 1 & -1 & 1 \\ -1 & 3 & 0 \\ 1 & 0 & 2 \end{pmatrix}.$$

Use whatever ad hoc method you can devise to show that

$$x^2 + 3y^2 + 2z^2 - 2xy + 2xz > 0$$

except at the origin.


REMARK. Since the terms in the above expression are purely quadratic, the homogeneity argument you may have used for Problem 3-18 shows that there exists a positive number $\lambda$ such that

$$x^2 + 3y^2 + 2z^2 - 2xy + 2xz \ge \lambda (x^2 + y^2 + z^2)$$

for all $(x, y, z) \in \mathbb{R}^3$. However, the problem of finding the largest such $\lambda$ is a difficult algebra problem, equivalent to finding a root of a certain cubic equation. We are very soon going to discuss this algebra. (The maximal $\lambda$ is approximately 0.1206.)
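The quoted value 0.1206 can be reproduced with a few lines of code. The sketch below assumes the standard fact (not yet developed at this point in the text) that the best $\lambda$ is the smallest root of the cubic $\det(H - \lambda I) = 0$; the cubic was expanded by hand from the matrix of Problem 3-29, and its smallest root is located by bisection. All function names are hypothetical.

```python
def char_poly(lam):
    """det(H - lam I) for H = [[1,-1,1],[-1,3,0],[1,0,2]], expanded by hand."""
    return -lam ** 3 + 6.0 * lam ** 2 - 9.0 * lam + 1.0


def smallest_root(lo=0.0, hi=1.0, iterations=60):
    """Bisection on [lo, hi]; char_poly(0) = 1 > 0 and char_poly(1) = -3 < 0."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if char_poly(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q(x, y, z):
    """The quadratic form of Problem 3-29."""
    return x * x + 3.0 * y * y + 2.0 * z * z - 2.0 * x * y + 2.0 * x * z


lam = smallest_root()
print(lam)
print(q(1.0, 0.0, 0.0) >= lam, q(1.0, 1.0, -1.0) >= lam * 3.0)
```

For $\lambda \le 0$ every term of the cubic is positive, so there are no roots below the bracketing interval and the root found is indeed the smallest one. The second printed line spot-checks the inequality $q \ge \lambda (x^2 + y^2 + z^2)$ at two sample points; it is an illustration, not a proof.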

We have just defined nondegenerate critical points by invoking the determinant of the 
associated Hessian matrix. Before we proceed further, we need to pause to define determinants 
and develop their properties. 
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F. Determinants 


Every $n \times n$ matrix (every square matrix) has associated with it a number which is called its determinant. We are going to denote the determinant of $A$ as $\det A$. This section is concerned with the definition and properties of this extremely important function.

If $n = 1$, a $1 \times 1$ matrix is just a real number $a$, so we define $\det a = a$.

If $n = 2$, then we rely on our experience to define
\[
\det A = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21}.
\]

Some students at this stage also have experience with $3 \times 3$ matrices, and know that the correct definition in this case is
\[
\det A = \det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]
\[
= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.
\]


Often this definition for $n = 3$ is presented in the following scheme for forming the six possible products of the $a_{ij}$'s, one from each row and column:

[Diagram: the $3 \times 3$ array of the $a_{ij}$ with diagonal arrows. The three products indicated receive a plus sign, the three formed with the opposite slant a minus sign.]


No such procedure is available for the correct definition for n > 3. However, meditation on 
the formulas n = 2 and n = 3 reveals certain basic properties, which we can turn into axioms 
for the general case. These properties are as follows: 


MULTILINEAR: det A is a linear function of each column of A; 
ALTERNATING: det A changes sign if two adjacent columns are interchanged; 
NORMALIZATION: $\det I = 1$.
(In the first two of these properties we could replace “column” by “row,” but we choose not to do that now. More on this later.) Incidentally, a function satisfying the first condition is said to be multilinear, not linear. A linear function would satisfy $\det(A + B) = \det A + \det B$, which is definitely not true. In order to work effectively with these axioms it is convenient to rewrite a matrix $A$ in a notation which displays its columns. Namely, we denote the columns of $A$ as $A_1, A_2, \dots, A_n$, respectively. In other words,
\[
A_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix}.
\]
Then we present $A$ in the form
\[
A = (A_1\ A_2\ \dots\ A_n).
\]

Here is the major theorem of this section. 

THEOREM. Let $n \ge 2$. Then there exists one and only one real-valued function det defined on all $n \times n$ matrices which satisfies

MULTILINEAR: $\det(A_1\ A_2\ \dots\ A_n)$ is a linear function of each $A_j$;

ALTERNATING: $\det(A_1 \dots A_j\, A_{j+1} \dots A_n) = -\det(A_1 \dots A_{j+1}\, A_j \dots A_n)$ for each $1 \le j \le n-1$;

NORMALIZATION: $\det I = 1$.

Rather than present a proof in our usual style, we are going to present a discussion of the 


ramifications of these axioms. Not only will we prove the theorem, we shall also in the process 
present useful techniques for evaluation of the determinant. Here’s an example. 


PROBLEM 3-30. Prove that if det satisfies the alternating condition of the theorem, 
then interchanging any two columns (not just adjacent ones) of A produces a matrix 
whose determinant equals — det A. 


PROBLEM 3-31. Prove that if two columns of A are equal, then det A = 0. 
PROBLEM 3-32. Conversely, prove that if det is known to be multilinear and satisfies $\det A = 0$ whenever two columns of $A$ are equal, then det is alternating.
(HINT: expand
\[
\det(A_1 \cdots (A_j + A_{j+1})(A_j + A_{j+1})A_{j+2} \cdots A_n)
\]
as a sum of four terms.)


Now we can almost immediately prove the uniqueness. It just takes a little patience with the notation. Rewrite the column $A_j$ in terms of the unit coordinate (column) vectors as
\[
A_j = \sum_{i=1}^n a_{ij}\hat e_i.
\]
For each column we require a distinct summation index, so change $i$ to $i_j$ to get
\[
A_j = \sum_{i_j=1}^n a_{i_j j}\hat e_{i_j}.
\]
Now the linearity of det as a function of each column produces a huge multi-indexed sum:
\[
\det A = \det(A_1\ A_2\ \dots\ A_n) = \sum_{i_1,\dots,i_n} a_{i_1 1} \cdots a_{i_n n} \det(\hat e_{i_1} \dots \hat e_{i_n}).
\]
In this huge sum each $i_j$ runs from 1 to $n$, giving us $n^n$ terms. Now look at each individual term in this sum. If the indices $i_1, \dots, i_n$ are not all distinct, then at least two columns of the matrix
\[
(\hat e_{i_1} \dots \hat e_{i_n})
\]
are equal, so that Problem 3-31 implies the determinant is 0. Thus we may restrict the sum to distinct $i_1, \dots, i_n$. The indices $i_1, \dots, i_n$ are then just the integers $1, \dots, n$ written in some other order, so the matrix
\[
(\hat e_{i_1} \dots \hat e_{i_n})
\]
can be transformed to the matrix
\[
(\hat e_1 \dots \hat e_n) = I
\]
by the procedure of repeatedly interchanging columns. Thus Problem 3-30 shows that
\[
\det(\hat e_{i_1} \dots \hat e_{i_n}) = \pm\det I = \pm 1,
\]
and the sign does not depend on det at all, but only on the interchanges needed. Thus
\[
\det A = \sum_{i_1,\dots,i_n} \pm\, a_{i_1 1} \cdots a_{i_n n}, \tag{$*$}
\]
and the signs depend on how $i_1, \dots, i_n$ arise from $1, \dots, n$. This finishes the proof of uniqueness of det!
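The sum over orderings produced by the uniqueness argument can be evaluated directly for small matrices. Here is a minimal Python sketch under one assumption the text leaves implicit: the sign attached to an ordering $i_1, \dots, i_n$ is taken to be $+1$ or $-1$ according to whether the number of inversions (pairs appearing out of order) is even or odd, which is the standard way to count the parity of the interchanges needed. The function names are illustrative.

```python
from itertools import permutations

def sign(perm):
    """+1 or -1 according to the parity of the number of inversions in perm."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm))
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det_perm(a):
    """Sum over all orderings (i_1, ..., i_n) of sign * a[i_1][0] * ... * a[i_n][n-1]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col, row in enumerate(perm):
            term *= a[row][col]      # entry from row i_col, column col
        total += term
    return total

M = [[1, -1, 1],
     [-1, 3, 0],
     [1, 0, 2]]
print(det_perm(M))  # prints 1
```

The sum has $n!$ nonzero candidates, so this is only practical for small $n$. It is useful mainly as a reference against which faster methods can be compared.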


Now we turn to the harder proof of existence. A very logical thing to do would be to 
use the formula we have just obtained to define det A; then we would “just” have to verify 
the three required properties. However, it turns out that there is an alternate procedure that 
gives a different formula and one that is very useful for computations. We now describe this 
formula. 

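Temporarily denote by $A_j'$ the column $A_j$ with first entry replaced by 0:
\[
A_j = \begin{pmatrix} a_{1j} \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix} = a_{1j}\hat e_1 + A_j'.
\]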


The multilinearity of det, supposing it exists, gives
\[
\det A = \det(a_{11}\hat e_1 + A_1'\ \dots\ a_{1n}\hat e_1 + A_n') = \text{a sum of } 2^n \text{ terms},
\]
where each term is a determinant arising from $\det A$ by replacing various columns of $A$ by $a_{1j}\hat e_1$'s. But if as many as two columns are so replaced the resulting determinant is 0, thanks to its multilinearity and Problem 3-31. Thus (if it exists)
\[
\det A = \sum_{j=1}^n a_{1j}\det(A_1' \dots \underbrace{\hat e_1}_{\text{column } j} \dots A_n') + \det(A_1' \dots A_n').
\]
The last of these determinants is 0, as the first row of the matrix consists of 0's and ($*$) gives 0. By interchanging the required $j - 1$ columns in the other matrices, we find
\[
\det A = \sum_{j=1}^n (-1)^{j-1}a_{1j}\det(\hat e_1\, A_1' \dots A_{j-1}'\, A_{j+1}' \dots A_n').
\]
The matrix that appears in the $j$th summand is
\[
\begin{pmatrix}
1 & 0 & \dots & 0 & 0 & \dots & 0 \\
0 & a_{21} & \dots & a_{2,j-1} & a_{2,j+1} & \dots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & a_{n1} & \dots & a_{n,j-1} & a_{n,j+1} & \dots & a_{nn}
\end{pmatrix}.
\]


It’s not hard to see that its determinant must be the same as that of the (n — 1) x (n — 1) 
matrix that lies in rows and columns 2 through n. However, we don’t need to prove that at 
this stage, as we can just use the formula to give the definition of det by induction on n. 

Before doing that it is convenient to introduce a bit of notation. Given an $n \times n$ matrix $A$ and two indices $i$ and $j$, we obtain a new matrix from $A$ by deleting its $i$th row and $j$th column. We denote this matrix by $A_{ij}$, and call it the minor for the $i$th row and $j$th column. It is of course an $(n-1) \times (n-1)$ matrix. Now here is the crucial result.


THEOREM. The uniqueness proof of the preceding theorem guarantees that only one determinant function for $n \times n$ matrices can exist. It does exist, and can be defined by induction on $n$ from the starting case $n = 1$ ($\det a = a$) and the inductive formula for $n \ge 2$,
\[
\det A = \sum_{j=1}^n (-1)^{i+j}a_{ij}\det A_{ij}.
\]
In this formula the row index $i$ is fixed, and can be any of the integers $1, \dots, n$ (the sum is independent of $i$).


The important formula in the theorem has a name: it is called expansion of $\det A$ by the minors along the $i$th row.
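The inductive definition translates almost word for word into a recursive procedure. The sketch below is a minimal Python illustration with illustrative names (`minor`, `det_minors`). Indices are 0-based, which does not change the parity of $i + j$. The top-level call expands along a chosen row, while the smaller determinants are expanded along their first row.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det_minors(a, i=0):
    """Expansion by minors along row i; the starting case is det of a 1 x 1 matrix."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** (i + j) * a[i][j] * det_minors(minor(a, i, j))
               for j in range(n))

M = [[1, -1, 1],
     [-1, 3, 0],
     [1, 0, 2]]
# The theorem asserts that the result does not depend on the row chosen.
print([det_minors(M, i) for i in range(3)])  # prints [1, 1, 1]
```

The recursion touches on the order of $n!$ products, so it is a faithful model of the definition and not an efficient algorithm.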


PROOF. What we must do is for any $1 \le i \le n$ prove that $\det A$ as defined above satisfies the three properties listed in the preceding theorem. In doing this we are of course allowed to use the same properties for the determinants of $(n-1) \times (n-1)$ matrices. We consign the proofs of the multilinearity and normalization to the problems following this discussion, as they are relatively straightforward, and content ourselves with the alternating property about switching two adjacent columns. Let us then assume that $B$ is the matrix which comes from $A$ by switching columns $k$ and $k+1$. Then if $j \ne k$ or $k+1$ the minors $B_{ij}$ and $A_{ij}$ also differ from one another by a column switch, so that the inductive hypothesis implies $\det B_{ij} = -\det A_{ij}$. Thus only $j = k$ and $k + 1$ matter:
\begin{align*}
\det B + \det A &= (-1)^{i+k}b_{ik}\det B_{ik} + (-1)^{i+k+1}b_{i,k+1}\det B_{i,k+1} \\
&\quad + (-1)^{i+k}a_{ik}\det A_{ik} + (-1)^{i+k+1}a_{i,k+1}\det A_{i,k+1} \\
&= (-1)^{i+k}a_{i,k+1}\left[\det B_{ik} - \det A_{i,k+1}\right] \\
&\quad + (-1)^{i+k}a_{ik}\left[-\det B_{i,k+1} + \det A_{ik}\right].
\end{align*}
Aha! We have $B_{ik} = A_{i,k+1}$ and $B_{i,k+1} = A_{ik}$. Thus $\det B + \det A = 0$.
QED


PROBLEM 3-33. Prove by induction that det A as defined inductively is a linear 
function of each column of A. 


PROBLEM 3-34. Prove by induction that $\det A$ as defined inductively satisfies $\det I = 1$.


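PROBLEM 3-35. Let $A$ be “upper triangular” in the sense that $a_{ij} = 0$ for all $i > j$. Prove that $\det A = a_{11}a_{22} \cdots a_{nn}$. That is,
\[
\det\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\
0 & a_{22} & a_{23} & \dots & a_{2n} \\
0 & 0 & a_{33} & \dots & a_{3n} \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \dots & a_{nn}
\end{pmatrix} = a_{11}a_{22} \cdots a_{nn}.
\]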


Do the same for lower triangular. 


One more property is needed in order to be able to compute determinants efficiently. This 
property states that all the properties involving column vectors and expansion along a row 
are valid also for the row vectors of the matrix and expansion along columns. Because of the 
uniqueness of the determinant, it will suffice to prove the following result. 


THEOREM. The determinant as defined above satisfies 
ROW MULTILINEAR: det A is a linear function of each row of A. 
ROW ALTERNATING: det A changes sign if two rows of A are interchanged. 
PROOF. The row multilinearity is an immediate consequence of the formula ($*$) or of the formula for expansion of $\det A$ by the minors along the $i$th row. The latter formula actually displays $\det A$ as a linear function of the row vector $(a_{i1}, a_{i2}, \dots, a_{in})$.

The row alternating property follows easily from the result of Problem 3-32, applied to rows instead of columns. Thus, assume that two rows of $A$ are equal. If $n = 2$, $\det A = 0$ because of our explicit formula in this case. If $n > 2$, then expand $\det A$ by minors along a different row. The corresponding determinants of the minors $A_{ij}$ are all zero (by induction on $n$) because each $A_{ij}$ has two rows equal.

QED 


The proofs of the following two corollaries are now immediate: 


COROLLARY. $\det A^t = \det A$.
Here $A^t$ is the transpose of $A$, as defined in Problem 2-83.


COROLLARY. Expansion by minors along the $j$th column:
\[
\det A = \sum_{i=1}^n (-1)^{i+j}a_{ij}\det A_{ij}.
\]


If we had to work directly from the definition of determinants, calculating them would be 
utterly tedious for n > 4. We would essentially need to add together n! terms, each a product 
of n matrix entries. Fortunately, the properties lend themselves to very efficient calculations. 
The goal is frequently to convert A to a matrix for which a row or column is mostly 0. 


The major tool for doing this is the property that if a scalar multiple of a column (or 
row) is added to a different column (or row), then the determinant is unchanged. 
Such procedures are called elementary column (or row) operations. 
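For instance,
\begin{align*}
\det\begin{pmatrix} \ast & \ast & \ast & \ast \\ 1 & -1 & -1 & 5 \\ 0 & 2 & 2 & 1 \\ 2 & 0 & -1 & 3 \end{pmatrix}
&= \det\begin{pmatrix} 0 & 4 & 3 & -9 \\ 1 & -1 & -1 & 5 \\ 0 & 2 & 2 & 1 \\ 0 & 2 & 1 & -7 \end{pmatrix} && \text{(added multiples of row 2 to rows 1 and 4)} \\
&= -\det\begin{pmatrix} 4 & 3 & -9 \\ 2 & 2 & 1 \\ 2 & 1 & -7 \end{pmatrix} && \text{(expanded by minors along column 1)} \\
&= -\det\begin{pmatrix} 22 & 21 & -9 \\ 0 & 0 & 1 \\ 16 & 15 & -7 \end{pmatrix} && \text{(added multiples of column 3 to columns 1 and 2)} \\
&= \det\begin{pmatrix} 22 & 21 \\ 16 & 15 \end{pmatrix} && \text{(expanded by minors along row 2)} \\
&= 2 \cdot 3\det\begin{pmatrix} 11 & 7 \\ 8 & 5 \end{pmatrix} && \text{(used linearity in columns 1 and 2)} \\
&= 6(55 - 56) \\
&= -6.
\end{align*}
(The entries marked $\ast$ in the first row of the original matrix are illegible in the source.)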
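The strategy of the example (create zeros with elementary operations, then read off the answer) can be automated. The following minimal Python sketch reduces a matrix to upper triangular form using only row operations that leave the determinant unchanged, plus row interchanges that flip its sign, and then multiplies the diagonal entries as in Problem 3-35. Exact rational arithmetic via `fractions.Fraction` is a choice made here for illustration, and the name `det_elim` is hypothetical.

```python
from fractions import Fraction

def det_elim(a):
    """Determinant by row reduction to upper triangular form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    sign = 1
    for c in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)               # no pivot available: the determinant is 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]  # a row interchange changes the sign
            sign = -sign
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            # Adding a multiple of row c to row r leaves the determinant unchanged.
            m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    result = Fraction(sign)
    for c in range(n):
        result *= m[c][c]                    # product of the diagonal entries
    return result

# The 4 x 4 matrix reached after the first step of the example.
step1 = [[0, 4, 3, -9],
         [1, -1, -1, 5],
         [0, 2, 2, 1],
         [0, 2, 1, -7]]
print(det_elim(step1))  # prints -6
```

The result agrees with the value $6(55 - 56) = -6$ obtained by hand, and the work grows like $n^3$ instead of $n!$.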


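PROBLEM 3-36. Find and prove an expression for the $n \times n$ determinant
\[
\det\begin{pmatrix}
0 & 0 & \dots & 0 & 1 \\
0 & 0 & \dots & 1 & 0 \\
\vdots & & & & \vdots \\
0 & 1 & \dots & 0 & 0 \\
1 & 0 & \dots & 0 & 0
\end{pmatrix}.
\]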
PROBLEM 3-37. Given fixed numbers $a$, $b$, $c$, let $x_n$ stand for the $n \times n$ “tridiagonal” determinant


Find a formula for $x_n$ in terms of $x_{n-1}$ and $x_{n-2}$. (Blank spaces are all 0.)


PROBLEM 3-38. Express the $n \times n$ tridiagonal determinant


in Fibonacci terms. The Fibonacci numbers may be defined by $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$.


PROBLEM 3-39. Find and prove an expression for the n x n determinant 
PROBLEM 3-40. Here is a famous determinant, called the Vandermonde determinant: prove that
il i> a7 


il War we 
det 2 


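PROBLEM 3-41. Calculate
\[
\det\begin{pmatrix}
\lambda + a_1b_1 & a_1b_2 & \dots & a_1b_n \\
a_2b_1 & \lambda + a_2b_2 & \dots & a_2b_n \\
\vdots & & \ddots & \vdots \\
a_nb_1 & a_nb_2 & \dots & \lambda + a_nb_n
\end{pmatrix}.
\]
(Answer: $\lambda^{n-1}(\lambda + a \bullet b)$.)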


PROBLEM 3-42. When applied to $n \times n$ matrices, determinant can be regarded as a function from $\mathbb{R}^{n^2}$ to $\mathbb{R}$. What is
\[
\frac{\partial \det A}{\partial a_{ij}}\,?
\]





G. Invertible matrices and Cramer’s rule 


Before entering into the main topic of this section, we give one more property of determinant.


THEOREM. det(AB) = det A det B. 


PROOF. Here of course $A$ and $B$ are square matrices of the same size, say $n \times n$. The $i$-$j$ entry of the product $AB$ is
\[
\sum_{k=1}^n a_{ik}b_{kj}.
\]
Thus the $j$th column of $AB$ is the column vector
\[
\begin{pmatrix} \sum_{k=1}^n a_{1k}b_{kj} \\ \vdots \\ \sum_{k=1}^n a_{nk}b_{kj} \end{pmatrix} = \sum_{k=1}^n b_{kj}A_k,
\]
where, as usual, $A_k$ is our notation for the $k$th column of $A$. We can now simply follow the procedure of p. 3-28. In the above formula rename the summation index $k = i_j$, so we have by the multilinearity of det,
\[
\det AB = \sum_{i_1,\dots,i_n} b_{i_1 1} \cdots b_{i_n n}\det(A_{i_1} \dots A_{i_n}).
\]
Here the $i_j$'s run from 1 to $n$. If they are not distinct, then the corresponding determinant is 0. Thus the $i_j$'s are distinct in all nonzero terms in the sum, and by column interchanges we have
\[
\det(A_{i_1} \dots A_{i_n}) = \pm\det A.
\]
In fact, the sign is the same as
\[
\det(\hat e_{i_1} \dots \hat e_{i_n}).
\]
Therefore
\[
\det AB = \det A \sum_{i_1,\dots,i_n} b_{i_1 1} \cdots b_{i_n n}\det(\hat e_{i_1} \dots \hat e_{i_n}) = \det A\det B.
\]


QED 


REMARKS. This result is a rather astounding bit of multilinear algebra. To convince 
yourself of that, just write down what it says in the case n = 2. Amazingly, the identical 
technique leads to a vast generalization, the Cauchy-Binet determinant theorem, which we 
shall present in Section 11D. It is also amazing that there is an entirely different approach to 
proving this result, as we shall discover in our study of integration in Chapter 10. 
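A quick way to appreciate the product theorem is to test it on matrices with integer entries, where the arithmetic is exact. The following Python sketch is only a spot check on randomly chosen examples, not a proof. The helper names `matmul`, `det` and `check_product_rule`, the entry range and the fixed seed are all choices made here for illustration.

```python
import random

def matmul(a, b):
    """Product of two n x n matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det(a):
    """Expansion by minors along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def check_product_rule(sizes=(2, 3, 4), trials=20, seed=0):
    """Return True if det(AB) == det(A) det(B) on every random integer example."""
    rng = random.Random(seed)   # fixed seed so that the run is reproducible
    for n in sizes:
        for _trial in range(trials):
            A = [[rng.randint(-5, 5) for _c in range(n)] for _r in range(n)]
            B = [[rng.randint(-5, 5) for _c in range(n)] for _r in range(n)]
            if det(matmul(A, B)) != det(A) * det(B):
                return False
    return True

print(check_product_rule())  # prints True
```

For $n = 2$ the identity being tested reads $(a_{11}b_{11} + a_{12}b_{21})(a_{21}b_{12} + a_{22}b_{22}) - (a_{11}b_{12} + a_{12}b_{22})(a_{21}b_{11} + a_{22}b_{21}) = (a_{11}a_{22} - a_{12}a_{21})(b_{11}b_{22} - b_{12}b_{21})$.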


DEFINITION. A square matrix $A$ is invertible if there exists a matrix $B$ such that $AB = BA = I$. The matrix $B$ is called the inverse of $A$, and is written $B = A^{-1}$.

PROBLEM 3-43. For the above definition of inverse to make sense we need to know that the matrix $B$ is unique. In fact, prove more: if $A$ has a right inverse $B$ ($AB = I$) and a left inverse $C$ ($CA = I$), then $B = C$.

(HINT: $CAB$.)


PROBLEM 3-44. Prove that if $A$ and $B$ are invertible, then $AB$ is invertible and
\[
(AB)^{-1} = B^{-1}A^{-1}.
\]


PROBLEM 3-45. Prove that in general
\[
(AB)^t = B^tA^t.
\]


PROBLEM 3-46. Prove that $A$ is invertible $\Longrightarrow$ $A^t$ is invertible. Also prove that
\[
(A^t)^{-1} = (A^{-1})^t.
\]





Now we are almost ready to state the major result of this section. There’s one more concept 
we need, and that is the linear independence of points in R”. 





DEFINITION. The vectors $x^{(1)}, x^{(2)}, \dots, x^{(k)}$ in $\mathbb{R}^n$ are linearly dependent if there exist real numbers $c_1, c_2, \dots, c_k$, not all 0, such that
\[
c_1x^{(1)} + c_2x^{(2)} + \dots + c_kx^{(k)} = 0.
\]
In other words, the given vectors satisfy some nontrivial homogeneous linear relationship. The vectors are said to be linearly independent if they are not linearly dependent.
A good portion of elementary linear algebra is dedicated to proving that if $x^{(1)}, \dots, x^{(n)}$ are linearly independent vectors in $\mathbb{R}^n$ (notice that it's the same $n$) then every vector in $\mathbb{R}^n$ can be expressed as a unique linear combination of the given vectors. That is, for any $x \in \mathbb{R}^n$ there exist unique scalars $c_1, \dots, c_n$ such that
\[
x = c_1x^{(1)} + \dots + c_nx^{(n)}.
\]

















PROBLEM 3-48. Prove that the scalars $c_1, \dots, c_n$ in the above equation are indeed unique.











PROBLEM 3-49. Prove that any n + 1 vectors in R” are linearly dependent. 





PROBLEM 3-50. Prove that the column vectors of an upper triangular matrix $(a_{ij})$ are linearly independent $\iff$ the diagonal entries $a_{ii}$ are all nonzero.





DEFINITION. If $x^{(1)}, \dots, x^{(n)}$ are linearly independent vectors in $\mathbb{R}^n$, they are said to be a basis for $\mathbb{R}^n$.





Now for the result. 


THEOREM. Let $A$ be an $n \times n$ matrix. Then the following conditions are equivalent:


(1) $\det A \ne 0$.


(2) A is invertible. 


(3) A has a right inverse. 


(4) A has a left inverse. 


(5) The columns of A are linearly independent. 


(6) The rows of A are linearly independent. 








PROOF. We first prove that (3) $\Rightarrow$ (1) $\Rightarrow$ (5) $\Rightarrow$ (3). First, (3) $\Rightarrow$ (1) because the matrix equation $AB = I$ implies $\det A\det B = \det I = 1$, so $\det A \ne 0$. To see that (1) $\Rightarrow$ (5), suppose the columns of $A$ are linearly dependent. Then one of them is a linear combination of the others, as we know from Problem 3-47. Inserting the resulting equation into the corresponding column of $A$ and using the linearity of $\det A$ with respect to that column, together with Problem 3-31, leads to $\det A = 0$. Now suppose (5) holds. Then every vector can be expressed as a linear combination of the columns $A_1, \dots, A_n$ of $A$. In particular, each coordinate vector can be so expressed:
\[
\hat e_j = \sum_{k=1}^n b_{kj}A_k
\]
for certain unique scalars $b_{1j}, \dots, b_{nj}$. Now we define the matrix $B = (b_{ij})$, and we realize the equations for $\hat e_j$ simply assert that
\[
I = AB.
\]
Thus (3) is valid.

So (1), (3), (5) are equivalent. Likewise, (1), (4), (6) are equivalent. We conclude that (1), (3), (4), (5), (6) are equivalent. Finally, if they all hold, then (3) and (4) imply $A$ is invertible, thanks to Problem 3-43. The converse assertions that (2) $\Rightarrow$ (3) and (4) are trivial.
QED


We also remark that in case $A$ is invertible, the two matrices $A$ and $A^{-1}$ commute. Though this is rather obvious, still it is quite a wonderful fact since we do not generally expect two matrices to commute. Whenever it happens that $AB = BA$, that is really worth our attention.

This is certainly a terrific theorem! It displays in dramatic fashion the importance of the determinant. However, it does not show how we might go about computing $A^{-1}$. There is a theorem that does this, and it goes by the name of Cramer’s rule.

Assume that $\det A \ne 0$ and that $B = A^{-1}$. Then we examine the equation $AB = I$. As we have seen, it can be written as the set of vector equations
\[
\sum_{k=1}^n b_{kj}A_k = \hat e_j, \qquad 1 \le j \le n.
\]
Now replace the $i$th column of $A$ by $\hat e_j$ and calculate the determinant of the resulting matrix. First, expansion by minors along the $i$th column produces
\[
\det(A_1 \dots A_{i-1}\, \hat e_j\, A_{i+1} \dots A_n) = (-1)^{i+j}\det A_{ji}.
\]
On the other hand, the formula for $\hat e_j$ yields
\begin{align*}
\det(A_1 \dots A_{i-1}\, \hat e_j\, A_{i+1} \dots A_n) &= \sum_{k=1}^n b_{kj}\det(A_1 \dots A_{i-1}\, A_k\, A_{i+1} \dots A_n) \\
&= b_{ij}\det(A_1 \dots A_{i-1}\, A_i\, A_{i+1} \dots A_n) \\
&= b_{ij}\det A.
\end{align*}
We conclude that
\[
b_{ij} = (-1)^{i+j}\frac{\det A_{ji}}{\det A}.
\]
Now that we have proved it, we record the statement:


CRAMER’S RULE. Assume $\det A \ne 0$. Then $A^{-1}$ is given by the formula for its $i$-$j$ entry,
\[
(-1)^{i+j}\frac{\det A_{ji}}{\det A}.
\]


Elegant as it is, this formula is rarely used for calculating inverses, on account of the effort required to evaluate determinants. However, the $2 \times 2$ case is especially appealing and easy to remember:
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]
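Cramer's rule can be turned into a short program, which also makes the index reversal visible: the $i$-$j$ entry of the inverse uses the minor with row $j$ and column $i$ deleted. The sketch below is a minimal Python illustration that assumes integer entries so that `fractions.Fraction` gives exact results. The names `minor`, `det` and `inverse` are illustrative, and indices are 0-based.

```python
from fractions import Fraction

def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Expansion by minors along the first row; the empty matrix gets 1 by convention."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def inverse(a):
    """Inverse of an integer matrix: entry (i, j) is (-1)^(i+j) det A_ji / det A."""
    n = len(a)
    d = det(a)
    if d == 0:
        raise ValueError("matrix is not invertible (det = 0)")
    return [[Fraction((-1) ** (i + j) * det(minor(a, j, i)), d) for j in range(n)]
            for i in range(n)]

M = [[1, -1, 1],
     [-1, 3, 0],
     [1, 0, 2]]
M_inv = inverse(M)
# Multiply back to confirm that M * M_inv is the identity matrix.
product = [[sum(M[i][k] * M_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # prints True
```

As the text notes, this is rarely the practical way to invert a matrix, since each of the $n^2$ entries costs a determinant. It is shown here only to make the formula concrete.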


COROLLARY. Assume $\det A \ne 0$ and $b \in \mathbb{R}^n$ is a column vector. Then the unique column vector $x \in \mathbb{R}^n$ which satisfies $Ax = b$ is given by $x = A^{-1}b$, and the $i$th entry of $x$ equals
\[
x_i = \frac{\det(A_1 \dots A_{i-1}\, b\, A_{i+1} \dots A_n)}{\det A}.
\]


PROBLEM 3-51. Prove the corollary. 




The equation Ax = b, with x as the “unknown,” is of course the general system of n linear 
equations in n unknowns, for if we display the coordinates we have 


\[
\begin{aligned}
a_{11}x_1 + \dots + a_{1n}x_n &= b_1, \\
&\ \,\vdots \\
a_{n1}x_1 + \dots + a_{nn}x_n &= b_n.
\end{aligned}
\]
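For a small system the corollary to Cramer's rule can be applied mechanically: replace column $i$ of $A$ by $b$, take the determinant, and divide by $\det A$. The following Python sketch does exactly that in exact arithmetic. It assumes integer data, and the names `det` and `solve_cramer` are illustrative.

```python
from fractions import Fraction

def det(a):
    """Expansion by minors along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def solve_cramer(a, b):
    """Solve a x = b for integer a, b with det a != 0, one determinant per unknown."""
    d = det(a)
    if d == 0:
        raise ValueError("det A = 0: the corollary does not apply")
    x = []
    for i in range(len(a)):
        # Replace column i of a by the right-hand side b.
        replaced = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        x.append(Fraction(det(replaced), d))
    return x

# The system 2 x1 + x2 = 3, x1 + 3 x2 = 5.
print(solve_cramer([[2, 1], [1, 3]], [3, 5]))  # prints [Fraction(4, 5), Fraction(7, 5)]
```

Substituting back, $2 \cdot \tfrac45 + \tfrac75 = 3$ and $\tfrac45 + 3 \cdot \tfrac75 = 5$, as required.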


We can conclude from our results that four conditions concerning this system of equations are 
equivalent: 
PROBLEM 3-52. Given an $n \times n$ matrix $A$, prove that the following four conditions are equivalent:

(1) $\det A \ne 0$.

(2) If $x \in \mathbb{R}^n$ is a column vector such that $Ax = 0$, then $x = 0$.

(3) For every $b \in \mathbb{R}^n$, there exists $x \in \mathbb{R}^n$ such that $Ax = b$.

(4) For every $b \in \mathbb{R}^n$, there exists one and only one $x \in \mathbb{R}^n$ such that $Ax = b$.


PROBLEM 3-53. The classical adjoint of a square matrix $A$ is denoted $\operatorname{adj}A$ and is defined by prescribing its $i$-$j$ entry to be
\[
(-1)^{i+j}\det A_{ji}.
\]
As we have shown, if $\det A \ne 0$, then $A^{-1} = \operatorname{adj}A/\det A$. Prove that in general
\[
A\operatorname{adj}A = (\det A)I.
\]
(HINT: write down the definition of the $i$-$j$ entry of the product on the left side and use expansion by minors along the $j$th row.)


PROBLEM 3-54. Prove that A and adjA commute. 


PROBLEM 3-55. Prove that
\[
\operatorname{adj}(\operatorname{adj}A) = (\det A)^{n-2}A.
\]





PROBLEM 3-56. Prove that the classical adjoint of an upper triangular matrix is 
also upper triangular. 
PROBLEM 3-57. Write out explicitly
\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{pmatrix}^{-1}.
\]


H. Recapitulation 
As we have become involved in a great deal of mathematics in our analysis of critical 


points, it seems good to me that we pause for a look at the forest. 


1. We are considering a function $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ of class $C^2$.

2. We assume $x_0$ is a critical point for $f$: $\nabla f(x_0) = 0$.

3. We consider the Taylor expansion of $f(x_0 + y)$:
\[
f(x_0 + y) = f(x_0) + \frac{1}{2}\sum_{i,j=1}^n D_iD_jf(x_0)y_iy_j + R.
\]

4. We define the Hessian matrix
\[
H = (D_iD_jf(x_0)).
\]

5. We say the critical point is nondegenerate if $\det H \ne 0$.


6. Our goal is now to show that in the nondegenerate case, the Hessian matrix reveals the 
local nature of the critical point. 


7. We have finished this program in dimension 2, where we learned that if the critical point 
is degenerate (det H = 0), we cannot detect the local behavior through H. 


What remains is the analysis of symmetric matrices like H and the associated behavior of the 
quadratic functions they generate. As this in itself is a fascinating and important topic, we 
devote the next chapter to this study. 
PROBLEM 3-58. Although this problem is concerned with $\mathbb{R}^n$, you can analyze it directly without requiring the result of Chapter 4. This is a generalization of Problem 3-24 (see also Problem 2-60). Let $a_1 > a_2 > \dots > a_n > 0$ be constants, and define $\mathbb{R}^n \xrightarrow{f} \mathbb{R}$ by
\[
f(x) = (a_1x_1^2 + \dots + a_nx_n^2)e^{-\|x\|^2}.
\]
a. Find all the critical points of f and determine the local nature of each one. 


b. Clearly, f attains a strict global minimum at x = 0. Show also that f attains a 
global maximum; is it strict? 


I. A little matrix calculus 


It is quite interesting to investigate the interplay between the algebra of matrices and 
calculus. Specifically we wish to analyze the operation of the inverse of a matrix. 

First we observe that the space of all $n \times n$ real matrices may be viewed as $\mathbb{R}^{n^2}$, a fact we observed on p. 2-48. Since the determinant of $A$ is a polynomial expression in the entries of $A$, we conclude that $\det A$ is a $C^\infty$ function on $\mathbb{R}^{n^2}$. The set of invertible matrices is the same as the set of matrices with nonzero determinant and is therefore an open subset of the set of all $n \times n$ matrices.








DEFINITION. The general linear group of $n \times n$ matrices is the set of all $n \times n$ real invertible matrices. It is commonly denoted
\[
GL(n).
\]
This is to be regarded as an open subset of the space of all $n \times n$ real matrices.

There is of course a very interesting function defined on $GL(n)$, namely the function which maps a matrix $A$ to its inverse $A^{-1}$. Let’s name this function inv. Thus
\[
\operatorname{inv}(A) = A^{-1}.
\]
Cramer’s rule displays $A^{-1}$ as a matrix whose entries are polynomial functions of $A$, divided by $\det A$. Thus $A^{-1}$ is a $C^\infty$ function of $A$. That is,
\[
GL(n) \xrightarrow{\ \operatorname{inv}\ } GL(n)
\]
is a $C^\infty$ function.
PROBLEM 3-59. If the $n \times n$ matrix $B$ is sufficiently small, $I + B \in GL(n)$. Prove that
\[
(I + B)^{-1} = I - (I + B)^{-1}B.
\]
Then prove that
\[
(I + B)^{-1} = I - B + (I + B)^{-1}B^2.
\]


PROBLEM 3-60. Use the preceding problem to show that the differential of inv at $I$ is the linear mapping which sends $B$ to $-B$. In terms of directional derivatives,
\[
D\operatorname{inv}(I; B) = -B.
\]


PROBLEM 3-61. Let $A \in GL(n)$. Use the relation
\[
A + B = A(I + A^{-1}B)
\]
to prove that
\[
D\operatorname{inv}(A; B) = -A^{-1}BA^{-1}.
\]
In other words,
\[
\frac{d}{dt}(A + tB)^{-1}\Big|_{t=0} = -A^{-1}BA^{-1}.
\]
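The formula of Problem 3-61 can be checked numerically before it is proved. The following Python sketch compares a symmetric difference quotient of $t \mapsto (A + tB)^{-1}$ at $t = 0$ with $-A^{-1}BA^{-1}$ for one pair of $2 \times 2$ matrices. The particular matrices, the step size and the helper names are arbitrary choices made for illustration. This is a floating-point experiment, not a proof.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix via the explicit formula; requires det != 0."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def directional_derivative(A, B, t=1e-6):
    """Symmetric difference quotient of t -> inv2(A + t B) at t = 0."""
    plus = inv2([[A[i][j] + t * B[i][j] for j in range(2)] for i in range(2)])
    minus = inv2([[A[i][j] - t * B[i][j] for j in range(2)] for i in range(2)])
    return [[(plus[i][j] - minus[i][j]) / (2 * t) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 3.0]]   # invertible: det A = 5
B = [[0.5, -1.0], [2.0, 1.0]]  # an arbitrary direction
numeric = directional_derivative(A, B)
A_inv = inv2(A)
exact = [[-x for x in row] for row in matmul2(matmul2(A_inv, B), A_inv)]
max_error = max(abs(numeric[i][j] - exact[i][j])
                for i in range(2) for j in range(2))
print(max_error < 1e-6)  # prints True
```

Note that the order of the factors in $-A^{-1}BA^{-1}$ matters, since $B$ and $A^{-1}$ need not commute. Swapping them in `exact` would in general make the comparison fail.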





